Log Data Processing Project - Milestone Tracker

This project streams and analyzes live or near-live log data using a range of AWS services.

Version-01

Version-02
A consumer script running on an EC2 instance reads new records from the Kinesis Data Stream (KDS) and writes each one to a DynamoDB table with PutItem API requests.
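The polling loop described above can be sketched as follows. This is a minimal illustration, not the project's actual script. It assumes each Kinesis record is a JSON log entry that already contains the DynamoDB table's key attributes. The Kinesis client and the DynamoDB `Table` resource are passed in as parameters, so the loop can be exercised without AWS. The function name `poll_shard` and its parameters are hypothetical.

```python
import json
import time
from decimal import Decimal


def poll_shard(kinesis, table, stream_name, shard_id, max_batches=None, idle_sleep=1.0):
    """Copy records from one Kinesis shard into a DynamoDB table (hypothetical helper).

    `kinesis` is a boto3 Kinesis client and `table` a boto3 DynamoDB Table
    resource; they are injected so the loop can be tested without AWS.
    """
    # LATEST: only read records that arrive after the consumer starts.
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="LATEST",
    )["ShardIterator"]

    written = 0
    batches = 0
    while iterator and (max_batches is None or batches < max_batches):
        response = kinesis.get_records(ShardIterator=iterator, Limit=100)
        for record in response["Records"]:
            # Assumption: each record is a JSON log entry containing the table's
            # key attributes. DynamoDB rejects Python floats, so decimal numbers
            # are parsed as Decimal.
            item = json.loads(record["Data"], parse_float=Decimal)
            table.put_item(Item=item)  # one PutItem request per record
            written += 1
        iterator = response.get("NextShardIterator")
        batches += 1
        if not response["Records"]:
            time.sleep(idle_sleep)  # back off while the shard is idle
    return written
```

On EC2 this might be called as `poll_shard(boto3.client("kinesis"), boto3.resource("dynamodb").Table("log-events"), "log-stream", "shardId-000000000000")`, where the table, stream, and shard names are placeholders. The sketch reads a single shard. A stream with several shards would need one loop per shard or a library such as the Kinesis Client Library.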

Version-03
To make the solution more efficient, the consumer script that previously ran on the EC2 instance was replaced with a Lambda function.
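A Lambda consumer that replaces the EC2 script might look like the sketch below. It relies on the standard Kinesis-to-Lambda event shape, in which each entry in `event["Records"]` carries its payload base64-encoded under `kinesis.data`. The rest is assumed. The environment variable `TABLE_NAME` is hypothetical, and so is the optional `table` parameter, which is used only to inject a stand-in for testing. As in the EC2 version, each payload is assumed to be a JSON object that already contains the table's key attributes.

```python
import base64
import json
import os
from decimal import Decimal

_table = None  # cached across warm invocations of the same Lambda container


def _get_table():
    """Create the DynamoDB Table resource lazily (TABLE_NAME is a hypothetical env var)."""
    global _table
    if _table is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        _table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
    return _table


def handler(event, context, table=None):
    """Write every Kinesis record in the batch to DynamoDB with PutItem."""
    table = table or _get_table()
    written = 0
    for record in event["Records"]:
        # Kinesis delivers the payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        # Assumption: payload is a JSON object with the table's key attributes;
        # floats become Decimal because DynamoDB does not accept Python floats.
        table.put_item(Item=json.loads(payload, parse_float=Decimal))
        written += 1
    return {"written": written}
```

Lambda calls `handler(event, context)` with two arguments, so the extra `table` parameter stays `None` in production. If any record fails, the function raises. Lambda then retries the whole batch according to the event source mapping settings.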

Version-04
The data was analyzed on an EMR cluster using Apache Spark and MLlib.

Version-05
AWS Glue was used to determine the schema of the data stored in S3, and Athena was used to query the data through that schema.
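Once Glue has registered the table in the Data Catalog (for example through a crawler), the table can be queried programmatically. The sketch below uses the Athena API calls `start_query_execution`, `get_query_execution`, and `get_query_results`. The database name, table name, and S3 output location in the usage example are placeholders. The client is passed in, so the polling logic can be tested without AWS.

```python
import time

TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run `sql` against a Glue Data Catalog database and return all result rows.

    `athena` is a boto3 Athena client; `output_location` is an S3 URI where
    Athena writes the result files. The first returned row holds column names.
    """
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Athena runs asynchronously: poll until the query reaches a final state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Results are paged; follow NextToken until it is absent.
    rows, token = [], None
    while True:
        kwargs = {"QueryExecutionId": query_id}
        if token:
            kwargs["NextToken"] = token
        page = athena.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            rows.append([col.get("VarCharValue") for col in row["Data"]])
        token = page.get("NextToken")
        if not token:
            return rows
```

A hypothetical call would be `run_athena_query(boto3.client("athena"), "SELECT status, count(*) AS hits FROM logs GROUP BY status", "log_db", "s3://example-bucket/athena-results/")`. Athena returns every value as a string, so numeric columns need converting on the client side.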

Version-06
To explore running the analysis in Redshift instead of Athena, Redshift Spectrum was used to query the S3 data as an external table in Redshift. This was done with the CREATE EXTERNAL SCHEMA command.
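The external schema step might look like the sketch below. It builds a CREATE EXTERNAL SCHEMA ... FROM DATA CATALOG statement that maps the Glue database into Redshift, then submits the statement through the Redshift Data API's `execute_statement`. All values in the test are placeholders: the schema name `spectrum_logs`, the Glue database `log_db`, the cluster, user, and IAM role ARN. The identifier check is a simple precaution that was added here; it is not part of the original setup.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def external_schema_sql(schema, glue_database, iam_role_arn):
    """Build a statement that exposes a Glue Data Catalog database as a Redshift schema."""
    for name in (schema, glue_database):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if "'" in iam_role_arn:
        raise ValueError("IAM role ARN must not contain quotes")
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        "FROM DATA CATALOG\n"
        f"DATABASE '{glue_database}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def create_external_schema(redshift_data, cluster_id, database, db_user,
                           schema, glue_database, iam_role_arn):
    """Submit the statement via a boto3 "redshift-data" client; returns the statement Id."""
    sql = external_schema_sql(schema, glue_database, iam_role_arn)
    response = redshift_data.execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sql=sql,
    )
    return response["Id"]
```

The Data API runs statements asynchronously, so `describe_statement(Id=...)` can be polled to confirm completion. After the schema exists, the Glue tables can be queried from Redshift as `spectrum_logs.<table_name>`, alongside ordinary Redshift tables.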

Version-07
Kinesis Data Analytics was used to analyze the data in real time, and a Lambda function integrated with SNS sends SMS alerts whenever a suspicious event is detected.
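The alerting Lambda might look like the sketch below, which makes several assumptions. The event is assumed to carry a `records` list in which each entry has a `recordId` and base64-encoded JSON `data`; this is the shape a Kinesis Data Analytics SQL application sends to a Lambda output destination. The handler answers with a matching `records` list, marking each one `Ok`. The fields `anomaly_score` and `source_ip`, the `SCORE_THRESHOLD` cut-off, and the `ALERT_TOPIC_ARN` environment variable are all hypothetical. The SNS topic is assumed to have an SMS subscription.

```python
import base64
import json
import os

SCORE_THRESHOLD = 3.0  # hypothetical cut-off above which an event counts as suspicious

_sns = None  # cached boto3 SNS client, reused across warm invocations


def _get_sns():
    global _sns
    if _sns is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        _sns = boto3.client("sns")
    return _sns


def handler(event, context, sns=None, topic_arn=None):
    """Publish an SMS alert via SNS for each suspicious analytics result."""
    sns = sns or _get_sns()
    topic_arn = topic_arn or os.environ["ALERT_TOPIC_ARN"]  # hypothetical env var
    results = []
    for record in event["records"]:
        row = json.loads(base64.b64decode(record["data"]))
        # Assumption: the analytics query emits an anomaly_score per event.
        score = row.get("anomaly_score", 0)
        if score >= SCORE_THRESHOLD:
            sns.publish(
                TopicArn=topic_arn,
                Message=f"Suspicious activity from {row.get('source_ip', 'unknown')}: "
                        f"score {score:.2f}",
            )
        # Acknowledge every record so the analytics application does not resend it.
        results.append({"recordId": record["recordId"], "result": "Ok"})
    return {"records": results}
```

Publishing to a topic, rather than to a single `PhoneNumber`, lets extra phone numbers or email addresses subscribe to alerts without any change to the function.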